An Asymptotically Tighter Bound on Sampling for Frequent Itemsets Mining
Authors
Abstract
In this paper we present a new error bound on sampling algorithms for frequent itemsets mining. We show that the new bound is asymptotically tighter than the state-of-the-art bounds: for a given sample and a small enough error probability, the new error bound is roughly half of the existing bounds. Based on the new bound, we give a new approximation algorithm that is much simpler than the existing approximation algorithms, yet still guarantees the worst-case approximation error with a precomputed sample size. We also give an algorithm that approximates the top-k frequent itemsets with high accuracy and efficiency.
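As a rough illustration of the setting the abstract describes (precompute a sample size, then estimate itemset frequencies from the sample), here is a minimal sketch. It uses the classical two-sided Hoeffding inequality for a single itemset, which is not the paper's new bound; the function names `hoeffding_sample_size` and `sample_frequencies` and the `max_len` cutoff are assumptions made for this example only.

```python
import math
import random
from collections import Counter
from itertools import combinations


def hoeffding_sample_size(epsilon, delta):
    """Sample size n such that, for ONE fixed itemset, the sampled frequency
    is within epsilon of the true frequency with probability >= 1 - delta.

    Classical Hoeffding bound: P(|f_hat - f| >= eps) <= 2 * exp(-2 * n * eps^2).
    This is NOT the tighter bound from the paper, only a textbook baseline.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def sample_frequencies(transactions, n, max_len=2, seed=0):
    """Draw n transactions uniformly with replacement and return the sampled
    frequency of every itemset of size <= max_len seen in the sample."""
    rng = random.Random(seed)
    sample = [rng.choice(transactions) for _ in range(n)]
    counts = Counter()
    for transaction in sample:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    return {itemset: count / n for itemset, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy database, used only to exercise the functions.
    db = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]] * 250
    n = hoeffding_sample_size(epsilon=0.05, delta=0.01)
    freqs = sample_frequencies(db, n)
    print(n, freqs[("a", "b")])
```

The guarantee above covers one itemset at a time; covering all itemsets simultaneously needs an extra argument (for example a union bound), and that is the part where the choice of bound affects the required sample size.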
Similar resources
Efficient Frequent Itemsets Mining by Sampling
As the first stage of discovering association rules, frequent itemsets mining is an important and challenging task for large databases. Sampling provides an efficient way to get approximate answers in much shorter time. Based on the characteristics of frequent itemsets counting, a new bound for sampling is proposed, with which fewer samples are necessary to achieve the required accuracy and the e...
MINING FUZZY TEMPORAL ITEMSETS WITHIN VARIOUS TIME INTERVALS IN QUANTITATIVE DATASETS
This research aims at proposing a new method for discovering frequent temporal itemsets in continuous subsets of a dataset with quantitative transactions. It is important to note that although these temporal itemsets may have relatively high support or occurrence within particular time intervals, they do not necessarily get similar support across the whole dataset, which makes i...
Using and extending itemsets in data mining: query approximation, dense itemsets, and tiles
Frequent itemsets are one of the best known concepts in data mining, and there is active research in itemset mining algorithms. An itemset is frequent in a database if its items co-occur in sufficiently many records. This thesis addresses two questions related to frequent itemsets. The first question is raised by a method for approximating logical queries by an inclusion-exclusion sum truncated...
Mining Frequent Patterns via Pattern Decomposition
• Candidates Generation and Test (Agrawal & Srikant, 1994; Heikki, Toivonen & Verkamo, 1994; Zaki et al., 1997): Starting at k=0, it first generates candidate k+1 itemsets from known frequent k itemsets and then counts the supports of the candidates to determine frequent k+1 itemsets that meet a minimum support requirement.
• Sampling Technique (Toivonen, 1996): Uses a sampling method to select a...
Efficient Discovery of Association Rules and Frequent Itemsets through Sampling
Discovery of frequent itemsets and association rules is a fundamental computational primitive with application in data mining (market basket analysis), databases (histogram construction), networking (heavy hitters) and more [Han et al., 2007, Sect. 5]. Depending on the particular application, one is interested in finding all itemsets with frequency greater or equal to a user defined threshold (...
Journal title:
- CoRR
Volume: abs/1703.08273; Issue:
Pages: -
Publication date: 2016